Interpreting Neural Policies with Disentangled Tree Representations
The advancement of robots, particularly those functioning in complex
human-centric environments, relies on control solutions that are driven by
machine learning. Understanding how learning-based controllers make decisions
is crucial, since robots are often safety-critical systems. This calls for a
formal and quantitative understanding of the explanatory factors in the
interpretability of robot learning. In this paper, we study the
interpretability of compact neural policies through the lens of disentangled
representations. We leverage decision trees to obtain factors of variation [1]
for disentanglement in robot learning; these encapsulate skills, behaviors, or
strategies toward solving tasks. To assess how well networks uncover the
underlying task dynamics, we introduce interpretability metrics that measure
disentanglement of the learned neural dynamics from concentration-of-decisions,
mutual-information, and modularity perspectives. Extensive experimental
analysis consistently demonstrates the connection between interpretability and
disentanglement.
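The mutual-information ingredient of such metrics can be sketched in a few lines. The variable names, binning, and toy data below are illustrative assumptions, not the paper's implementation:

```python
import math
from collections import Counter

# Toy sketch: discrete mutual information between a (binned) neuron
# response and a decision-tree factor. Higher values indicate that the
# neuron carries more information about the factor.

def mutual_information(xs, ys):
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum((c / n) * math.log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# A neuron that fires exactly when the factor is active carries maximal
# information about it (here, H(factor) = ln 2 for a balanced factor).
factor = [0, 0, 1, 1, 0, 1, 0, 1]
neuron = [0, 0, 1, 1, 0, 1, 0, 1]   # perfectly aligned with the factor
noise  = [0, 1, 0, 1, 0, 1, 0, 1]   # unrelated activation pattern
print(mutual_information(neuron, factor))
print(mutual_information(noise, factor))
```

Disentanglement metrics of this flavor then aggregate such scores across neurons and factors, rewarding one-to-one alignment.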
Solving Continuous Control via Q-learning
While there has been substantial success in solving continuous control with
actor-critic methods, simpler critic-only methods such as Q-learning see
limited application in the associated high-dimensional action spaces. However,
most actor-critic methods come at the cost of added complexity: heuristics for
stabilisation, compute requirements and wider hyperparameter search spaces. We
show that a simple modification of deep Q-learning largely alleviates these
issues. By combining bang-bang action discretization with value decomposition,
thereby framing single-agent control as cooperative multi-agent reinforcement
learning (MARL), this simple critic-only approach matches the performance of
state-of-the-art continuous actor-critic methods when learning from features or
pixels. We extend classical bandit examples from cooperative MARL to provide
intuition for how decoupled critics leverage state information to coordinate
joint optimization, and demonstrate surprisingly strong performance across a
variety of continuous control tasks.
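The decoupled bang-bang idea can be sketched in a few lines. The dimension count, Q-table values, and function names below are illustrative assumptions, not the authors' code:

```python
# Hypothetical sketch: decoupled bang-bang critics for a 3-dimensional
# continuous action space. Each dimension is discretised to {-1, +1};
# a separate critic head scores each dimension, and the joint value is
# the sum of per-dimension values (a linear value decomposition).

BINS = (-1.0, 1.0)
N_DIMS = 3

def joint_q(per_dim_q, choice):
    """Sum per-dimension Q-values for one joint bang-bang action."""
    return sum(per_dim_q[d][b] for d, b in enumerate(choice))

def greedy_action(per_dim_q):
    """Decoupled argmax: each dimension is maximised independently,
    which is exact under the additive decomposition."""
    return tuple(max(range(len(BINS)), key=lambda b: per_dim_q[d][b])
                 for d in range(N_DIMS))

# Toy per-dimension Q-values for one state (would come from a network).
q = [[0.2, 0.9], [0.5, 0.1], [-0.3, 0.4]]
a = greedy_action(q)
print(a)                     # bin index chosen per dimension
print([BINS[b] for b in a])  # the resulting bang-bang action vector
```

The decomposition is what keeps the maximisation tractable: the joint argmax over 2^N actions reduces to N independent two-way choices.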
Learning to Plan via Deep Optimistic Value Exploration
Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble of models and formulate an upper confidence bound (UCB) objective to recover optimistic estimates. Training the policy on ensemble rollouts with the learned value function as the terminal cost allows for projecting long-term interactions into a limited planning horizon, thus enabling deep optimistic exploration. We do not assume a priori knowledge of either the dynamics or reward function. We demonstrate that our approach can accommodate both dense and sparse reward signals, while improving sample complexity on a variety of benchmarking tasks.
Keywords: Reinforcement Learning; Deep Exploration; Model-Based; Value Function; UCB
Funding: Office of Naval Research; Qualcomm; Toyota Research Institute
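The optimistic estimate can be illustrated with a minimal sketch. The function name and the mean-plus-scaled-standard-deviation form are our assumptions of a generic UCB, not necessarily the paper's exact objective:

```python
import math

# Illustrative sketch: an optimistic value estimate from an ensemble of
# value predictions, using a UCB of the form mean + beta * std.

def ucb_value(ensemble_values, beta=1.0):
    n = len(ensemble_values)
    mean = sum(ensemble_values) / n
    var = sum((v - mean) ** 2 for v in ensemble_values) / n
    return mean + beta * math.sqrt(var)

# Disagreement between ensemble members inflates the optimistic target,
# directing the policy toward uncertain, under-explored regions.
print(ucb_value([1.0, 1.0, 1.0]))  # no disagreement -> 1.0
print(ucb_value([0.0, 1.0, 2.0]))  # disagreement adds an exploration bonus
```

Using this optimistic value as the terminal cost of short model rollouts is what lets a limited planning horizon still drive deep exploration.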
Locomotion Planning through a Hybrid Bayesian Trajectory Optimization
Locomotion planning for legged systems requires reasoning about suitable contact schedules. The contact sequence and timings constitute a hybrid dynamical system and prescribe a subset of achievable motions. State-of-the-art approaches cast motion planning as an optimal control problem. To decrease computational complexity, one common strategy separates footstep planning from motion optimization and plans contacts using heuristics. In this paper, we propose to learn contact schedule selection from high-level task descriptors using Bayesian Optimization. A bi-level optimization is defined in which a Gaussian Process model predicts the performance of trajectories generated by a motion planning nonlinear program. The agent therefore retains the ability to reason about suitable contact schedules, while explicit computation of the corresponding gradients is avoided. We delineate the algorithm in its general form and provide results for planning single-legged hopping. Our method learns contact schedule transitions that align with human intuition and performs competitively against a heuristic baseline in predicting task-appropriate contact schedules.
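The bi-level structure can be sketched abstractly. The toy cost function, candidate set, and the nearest-neighbour predictor standing in for the Gaussian Process surrogate are all illustrative assumptions, not the paper's method:

```python
# Highly simplified sketch of the bi-level loop: an outer loop proposes
# contact-schedule parameters (here a single phase-duration scalar), a
# surrogate predicts trajectory cost, and the most promising candidate
# is evaluated by the (expensive) motion planner. A real implementation
# would use a Gaussian Process surrogate and an acquisition function;
# a nearest-neighbour predictor stands in here, purely for illustration.

def expensive_planner_cost(duration):
    """Stand-in for the motion-planning NLP: cost of a hop with the
    given contact-phase duration (toy quadratic, optimum at 0.3)."""
    return (duration - 0.3) ** 2

def surrogate_predict(history, x):
    """Toy surrogate: cost of the nearest previously evaluated point."""
    nearest = min(history, key=lambda h: abs(h[0] - x))
    return nearest[1]

candidates = [i / 10 for i in range(1, 10)]        # durations 0.1 .. 0.9
history = [(0.1, expensive_planner_cost(0.1)),
           (0.9, expensive_planner_cost(0.9))]     # initial evaluations

for _ in range(5):
    # Outer loop: pick the unevaluated candidate the surrogate rates
    # best, then query the true planner and record the observation.
    remaining = [c for c in candidates if c not in {h[0] for h in history}]
    x = min(remaining, key=lambda c: surrogate_predict(history, c))
    history.append((x, expensive_planner_cost(x)))

best = min(history, key=lambda h: h[1])
print(best)
```

The key point is that the outer loop never needs gradients of the inner nonlinear program; it only observes its scalar cost.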
Inclusion of Angular Momentum During Planning for Capture Point Based Walking
When walking at high speeds, the swing legs of robots produce a non-negligible angular momentum rate. To accommodate this, we provide a reference trajectory generator for bipedal walking that incorporates predicted centroidal angular momentum at the planning stage. This can be done efficiently, as the Centroidal Moment Pivot (CMP), Instantaneous Capture Point (ICP), and center of mass (CoM) all have closed-form trajectory solutions due to their linear dynamics. These are then used to produce smooth, continuous trajectories. We furthermore provide a lightweight model to estimate the angular momentum induced during the leg-swing phase of the gait cycle. Our proposed trajectory generator is tested thoroughly in simulation and has been shown to operate successfully on the real hardware.
Neighborhood Mixup Experience Replay: Local Convex Interpolation for Improved Sample Efficiency in Continuous Control Tasks
Experience replay plays a crucial role in improving the sample efficiency of
deep reinforcement learning agents. Recent advances in experience replay
propose using Mixup (Zhang et al., 2018) to further improve sample efficiency
via synthetic sample generation. We build upon this technique with Neighborhood
Mixup Experience Replay (NMER), a geometrically-grounded replay buffer that
interpolates transitions with their closest neighbors in state-action space.
NMER preserves a locally linear approximation of the transition manifold by
only applying Mixup between transitions with vicinal state-action features.
Under NMER, a given transition's set of state-action neighbors is dynamic and
episode-agnostic, in turn encouraging greater policy generalizability via
inter-episode interpolation. We combine our approach with recent off-policy
deep reinforcement learning algorithms and evaluate on continuous control
environments. We observe that NMER improves sample efficiency by an average 94%
(TD3) and 29% (SAC) over baseline replay buffers, enabling agents to
effectively recombine previous experiences and learn from limited data.
Comment: Accepted to L4DC 202
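The neighborhood-Mixup step can be sketched as follows. The buffer contents, function names, and Beta parameter are illustrative assumptions; real NMER operates on a full replay buffer with learned policies, not toy tuples:

```python
import random

# Illustrative sketch of the core NMER idea (not the authors' code):
# pick a stored transition, find its nearest neighbour in concatenated
# state-action space, and Mixup-interpolate every element of the two
# transitions with a Beta-distributed coefficient.

def nearest_neighbor(buffer, idx):
    """Index of the transition closest to buffer[idx] in (s, a) space."""
    s, a = buffer[idx][0], buffer[idx][1]
    def dist(j):
        s2, a2 = buffer[j][0], buffer[j][1]
        return sum((x - y) ** 2 for x, y in zip(s + a, s2 + a2))
    return min((j for j in range(len(buffer)) if j != idx), key=dist)

def mixup(t1, t2, alpha=0.75):
    """Convex combination of two transitions (Mixup-style)."""
    lam = random.betavariate(alpha, alpha)
    def mix(u, v):
        return tuple(lam * x + (1 - lam) * y for x, y in zip(u, v))
    s1, a1, r1, s1n = t1
    s2, a2, r2, s2n = t2
    return (mix(s1, s2), mix(a1, a2), lam * r1 + (1 - lam) * r2, mix(s1n, s2n))

# Transitions are (state, action, reward, next_state) tuples.
buffer = [((0.0, 0.0), (1.0,), 0.5, (0.1, 0.0)),
          ((0.1, 0.0), (0.9,), 0.6, (0.2, 0.1)),
          ((5.0, 5.0), (-1.0,), -1.0, (4.9, 5.0))]

j = nearest_neighbor(buffer, 0)
synthetic = mixup(buffer[0], buffer[j])
```

Restricting Mixup to vicinal neighbours is what keeps the synthetic transitions near the transition manifold, rather than interpolating between dynamically unrelated experiences.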
Good Posture, Good Balance: Comparison of bio-inspired and model-based approaches for posture control of humanoid robots
This article provides a theoretical and thorough experimental comparison of two distinct posture control approaches: 1) a fully model-based control approach and 2) a biologically inspired approach derived from human observations. While the robotic approach can easily be applied to balancing in three-dimensional (3-D) and multi-contact (MC) situations, the biologically inspired balancer currently only works in two-dimensional situations but shows interesting robustness properties under time delays in the feedback loop. This is an important feature when considering the signal transmission and processing properties of the human sensorimotor system. Both controllers were evaluated in a series of experiments with a torque-controlled humanoid robot (TORO). The article concludes with some suggestions for the improvement of model-based balancing approaches in robotics.